220 ◾ Bioinformatics
-U data/ENCFF000XGP_inp0.fastq.gz \
-S bam/ENCFF000XGP_inp0.sam \
2> bam/inp0.log
bowtie2 \
-p 4 \
-x ref/hg19 \
-U data/ENCFF000XJP_chp1.fastq.gz \
-S bam/ENCFF000XJP_chp1.sam \
2> bam/chp1.log
bowtie2 \
-p 4 \
-x ref/hg19 \
-U data/ENCFF000XJS_chp2.fastq.gz \
-S bam/ENCFF000XJS_chp2.sam \
2> bam/chp2.log
bowtie2 \
-p 4 \
-x ref/hg19 \
-U data/ENCFF000XKD_chp3.fastq.gz \
-S bam/ENCFF000XKD_chp3.sam \
2> bam/chp3.log
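The four alignment commands differ only in their file names, so they can also be generated in a loop. The following is a minimal dry-run sketch, not the authors' method: it prints each command instead of executing it, so it can be inspected before running. The helper name build_bowtie2_cmd is hypothetical, and the sketch assumes that the input sample uses the same "-p 4 -x ref/hg19" options as the three ChIP samples.

```shell
# Dry run: print (not execute) one bowtie2 command per FASTQ file.
# build_bowtie2_cmd is a hypothetical helper name used for illustration.
build_bowtie2_cmd() {
  fq=$1
  base=$(basename "$fq" .fastq.gz)   # e.g. ENCFF000XJP_chp1
  tag=${base#*_}                     # e.g. chp1 (used for the log file name)
  printf 'bowtie2 -p 4 -x ref/hg19 -U %s -S bam/%s.sam 2> bam/%s.log\n' \
    "$fq" "$base" "$tag"
}

for fq in data/ENCFF000XGP_inp0.fastq.gz data/ENCFF000XJP_chp1.fastq.gz \
          data/ENCFF000XJS_chp2.fastq.gz data/ENCFF000XKD_chp3.fastq.gz; do
  build_bowtie2_cmd "$fq"
done
```

Once the printed commands look right, they can be pasted into the terminal or the loop output can be piped to "sh".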
The four SAM files produced by the above commands contain the alignment information
of the reads. However, they may also include alignments that we do not need; removing
them focuses the analysis on the regions of interest and reduces the computational
load. We can remove the mitochondrial read alignments, which have “chrM” in the
chromosome field of the SAM file, and the alignments to unplaced, unlocalized
(“random”), and alternate-haplotype sequences, whose names contain “chrUn”, “random”,
and “hap”, respectively, keeping only the reads aligned to the assembled human
chromosomes. We can use the “sed” Linux command to do that, saving the filtered
alignments in new files.
cd bam
sed '/chrM/d;/random/d;/chrUn/d;/hap/d' ENCFF000XGP_inp0.sam \
> ENCFF000XGP_inp0_filt.sam
sed '/chrM/d;/random/d;/chrUn/d;/hap/d' ENCFF000XJP_chp1.sam \
> ENCFF000XJP_chp1_filt.sam
sed '/chrM/d;/random/d;/chrUn/d;/hap/d' ENCFF000XJS_chp2.sam \
> ENCFF000XJS_chp2_filt.sam
sed '/chrM/d;/random/d;/chrUn/d;/hap/d' ENCFF000XKD_chp3.sam \
> ENCFF000XKD_chp3_filt.sam
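The effect of this filter expression can be checked on a small toy file before it is applied to the real alignments. The sketch below is illustrative only: the read names and alignment records are made up, and the temporary file names are assumptions, not part of the ENCODE data set.

```shell
# Build a toy SAM file with three header lines and five made-up alignments.
tmpdir=$(mktemp -d)
toy="$tmpdir/toy.sam"
{
  printf '@SQ\tSN:chr1\tLN:249250621\n'
  printf '@SQ\tSN:chrM\tLN:16571\n'
  printf '@SQ\tSN:chr6_apd_hap1\tLN:4622290\n'
  printf 'r1\t0\tchr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'r2\t0\tchrM\t200\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'r3\t0\tchrUn_gl000211\t300\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'r4\t16\tchr1_gl000191_random\t400\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n'
  printf 'r5\t0\tchr6_apd_hap1\t500\t42\t5M\t*\t0\t0\tACGTA\tIIIII\n'
} > "$toy"

# Same sed expression as in the text: delete every line matching a pattern.
sed '/chrM/d;/random/d;/chrUn/d;/hap/d' "$toy" > "$tmpdir/toy_filt.sam"

# Count surviving alignment lines (not starting with @) and header lines.
kept_reads=$(grep -vc '^@' "$tmpdir/toy_filt.sam")
kept_headers=$(grep -c '^@' "$tmpdir/toy_filt.sam")
echo "reads kept: $kept_reads, header lines kept: $kept_headers"
# prints "reads kept: 1, header lines kept: 1"

rm -rf "$tmpdir"
```

Because sed matches the patterns anywhere on a line, the filter also drops the @SQ header lines of the excluded sequences, as the toy output shows, and it would remove any read whose name happened to contain one of the patterns.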
We can then convert the SAM files into BAM files using the “samtools view” command.
samtools view -S -b ENCFF000XGP_inp0_filt.sam \
> ENCFF000XGP_inp0_filt.bam